Access Global AI Models - Power Next-Gen Apps

From General to Specialized AI - All Models in One Platform

208 models match the criteria

Grok 4 Fast

Text generation | Multilingual

Grok 4 Fast is a lightweight large language model launched by xAI in 2025 that focuses on high-speed inference and cost optimization. Its core features include a generation speed of 75 tokens per second (10 times faster than the standard version) and a 2-million-token context window that can take in entire books or codebases in a single pass. Inference cost is reduced by 98%, and an optimized architecture cuts the consumption of inference tokens by 40%. As the base model of the Grok 4 series, it combines text and image input, real-time web access (the DeepSearch tool), and function calling. It targets lightweight scenarios such as everyday Q&A and document processing, and is planned to gradually replace Grok 3 as the base service for free users. The model keeps its multimodal capabilities while prioritizing efficiency to meet the needs of ordinary users.

GPT-5 Codex

Text generation | Multilingual

GPT-5 Codex is a multi-model hybrid code generation system launched by OpenAI. It combines high-efficiency base models with deep reasoning modules and schedules resources dynamically through intelligent routing. Its code generation ability is significantly improved, enabling rapid construction of complex front-end applications and debugging of large codebases. It can generate complete websites and games from a single prompt and handles design aesthetics better. It is suited to programming, application building, and code debugging. Free users get basic functions, while the paid version offers higher limits and extended reasoning capabilities.

Claude 3 Opus

Text generation | Multilingual | Tool Call

Claude 3 Opus is a top-tier large model launched by Anthropic. It is the high-end model of the Claude 3 series, has multimodal capabilities, and supports a context window of 200,000 tokens. It offers a leading level of intelligence, outperforming its peers on benchmarks such as MMLU and GPQA, and can deeply understand complex tasks and interact in a human-like way. It is suited to task automation (API and database operations), R&D (drug development, research review), and strategic analysis (financial trend forecasting, chart interpretation).

Claude Haiku 4.5

Text generation | Multilingual | Tool Call

Claude Haiku 4.5 is a small hybrid reasoning model launched by Anthropic. Its performance is close to that of the mid-sized Sonnet 4 at one-third of the cost and more than twice the speed. It handles a context of 200,000 tokens, supports multimodal prompts, and is classified at AI Safety Level 2 (ASL-2). It is suited to real-time scenarios such as intelligent customer service, programming assistance, and conversational assistants, and can be integrated through the Claude apps, the API, and major cloud platforms.

Claude 3 Sonnet

Text generation | Multilingual

Claude 3 Sonnet is a large language model launched by Anthropic. It is the mid-range model of the Claude 3 series, balancing capability and speed, and is suited to enterprise applications. It is twice as fast as its predecessor, is highly steerable, supports tasks such as content generation, classification, data extraction, and knowledge retrieval, and is available through the API and Amazon Bedrock.

Gemini 2.5 Flash Lite

Text generation | Multilingual | Tool Call

Gemini 2.5 Flash-Lite is a lightweight AI inference model (preview version) launched by Google, featuring very fast responses and cost optimization. It is currently the fastest Gemini model. It supports multimodal input, a 1-million-token context, and Google's native tools (such as search and code execution). It is suited to high-throughput, low-latency scenarios such as translation and classification, and is offered to developers through an API.

Qwen3 Vl 235b A22b Thinking

Visual understanding | Tool Call

Qwen3-VL-235B-A22B-Thinking is the flagship vision-language model of Alibaba's Tongyi Qianwen Qwen3 series. It uses a MoE architecture with 235 billion parameters. It has GUI-level visual agent capabilities, supports OCR in 32 languages, offers a 256K context (extendable to 1M), and excels at video understanding and multimodal reasoning. It is suited to complex multimodal workflows, long-document retrieval, and intelligent interaction scenarios.

Qwen3 Coder Plus

Text generation | Tool Call

Qwen3-Coder-Plus is an enhanced code generation model in Alibaba's Tongyi Qianwen series. It uses a 480B-parameter Mixture of Experts (MoE) architecture with 35 billion active parameters and a 1M context window. It features strong code understanding and generation, supports multiple languages and complex logical reasoning, and performs comparably to Claude Sonnet. It is suited to agentic programming tasks such as analyzing large projects and operating on codebases.

Qwen3-Max

Text generation | Tool Call

Qwen3-Max is the most advanced large model in Alibaba's Qwen3 series. It has parameters at the trillion scale, is pre-trained on 36T tokens, supports a context of over 260,000 tokens, covers multiple languages, and has an explicit reasoning mode. It is suited to complex tasks such as enterprise policy Q&A, code review, and data analysis.

Qwen3-VL-plus

Visual understanding | Tool Call

Qwen3-VL-plus is an enhanced vision-language model launched by Alibaba's Tongyi Qianwen. It belongs to the Qwen3-VL series and comes in Instruct and Thinking versions. It delivers high performance with a small parameter count: the 8B-parameter version approaches the performance of the previous generation's 72B flagship model. It supports images of over one million pixels and improves detail recognition, text understanding, and complex visual reasoning. It is suited to scenarios such as intelligent customer service, image recognition, content creation, and decision support.

Qwen Image Plus

Image generation

Qwen-image-plus is an image generation model in Alibaba Cloud's Tongyi Qianwen series. It is a professional version of Qwen-Image that excels at complex text rendering and supports Chinese, English, and multi-line layouts. It is suited to scenarios that require precise text generation, such as posters and couplets. It costs less than the basic version and can be called through an API, balancing quality and efficiency.

Doubao Seed Translation

Text generation

Doubao-Seed-Translation is a large multilingual translation model launched by ByteDance's Volcengine. Based on the Transformer architecture, it supports mutual translation among 28 languages. It has high accuracy (BLEU score of 42.5) and fluency, and is suitable for general text translation scenarios such as cross-border e-commerce, international cooperation, and education and learning.

Qwen3 Livetranslate Flash Realtime 2025 09 22

Speech recognition | Multilingual

Qwen3-LiveTranslate-Flash is a multilingual model for real-time simultaneous interpretation of audio and video, launched by Alibaba's Tongyi Qianwen. It is built on Qwen3-Omni and trained on fused multimodal data. It supports offline and real-time translation across 18 languages and dialects with a latency as low as 3 seconds. Its visual enhancement technology improves accuracy in complex scenarios, where it outperforms mainstream models. It is suited to scenarios such as international conferences, remote teaching, and cross-border collaboration.

Wan2.5 I2v Preview

Video generation

wan2.5-i2v-preview is an image-to-video model in Alibaba's Tongyi Wanxiang 2.5 series. It is a multimodal generation model that uses a unified framework to combine text, image, video, and audio generation. It supports 1080P high-definition video output with synchronized audio and video, understands camera movement language, keeps the identity of elements consistent, and supports audio-driven video generation. It is suited to content creation in fields such as advertising, e-commerce, film and television, and education.

Qwen3 Omni Flash Realtime

Full modality | Multilingual

Qwen3-omni-flash-realtime is a real-time full-modal AI model launched by Alibaba's Tongyi Qianwen. It processes text, images, audio, and video, and has real-time interaction capabilities such as streaming conversation and mid-response interruption. It can be applied to scenarios such as voice assistants, multimedia analysis, and intelligent editing, and supports 119 languages for text and 20 for voice interaction.

Qwen3 Tts Flash

Text-to-speech synthesis | Multilingual

Qwen3-TTS-Flash is a text-to-speech model launched by Alibaba's Tongyi. It supports 10 languages, 17 voice timbres, and 9 Chinese dialects. It adjusts tone intelligently and has a first-packet latency of 97 ms. It is suited to scenarios such as intelligent customer service, audio creation, and voice assistants.

Qwen3 Tts Flash Realtime

Speech synthesis | Multilingual

Qwen3-TTS-Flash-Realtime is a real-time text-to-speech model launched by Alibaba's Tongyi. It has a first-packet latency of 97 ms and supports 17 timbres, 10 languages, and 17 dialects, producing natural and fluent speech. It is suited to scenarios such as intelligent customer service, audiobooks, AI teachers, and film and television dubbing.

Kimi-K2

Text generation | Tool Call

Kimi-K2 is an open-source large language model built on a MoE architecture, with a trillion total parameters and 32 billion active parameters. It is trained efficiently with the MuonClip optimizer. It stands out in code generation, tool invocation, and mathematical reasoning, runs fast, can decompose complex tasks, and can generate visual data analysis reports. It is suited to scenarios such as programming, professional data analysis, and text creation.

Doubao SeedEdit 3.0 I2i

Image generation | Multilingual

Doubao-SeedEdit-3.0-i2i is an image editing model that performs complex visual operations from natural language prompts, such as background removal, lighting adjustment, and pose changes. It uses random seeds to control the randomness of generation. It is designed for commercial use in advertising, content creation, and e-commerce.

Qwen3 Asr Flash

Speech recognition | Multilingual

Qwen3-ASR-Flash is a speech recognition model launched by Alibaba's Tongyi Qianwen. It supports 19 types of voice input (including 5 Chinese dialects) and 11 languages, and offers low-latency streaming processing. It is suited to scenarios such as voice assistants, subtitle generation, and multimodal conversations. Its recognition error rate for Chinese and English is lower than that of GPT-4o-transcribe, and it comes with 10 hours of free usage.

Qwen-VL-Plus

Visual understanding

Qwen-VL-Plus is a multi-modal model in the Alibaba Tongyi Qianwen Qwen2.5-VL series, focusing on visual language understanding. It enhances detail recognition and text processing, supports images with over one million pixels and any aspect ratio, and is suitable for scenarios such as professional document processing, high-precision recognition, and visual reasoning.

Qianfan-VL-8B

Image understanding | Chinese

Qianfan-VL-8B is a large multimodal visual understanding model launched by Baidu, belonging to the 8-billion parameter version of the Qianfan-VL series. It has three major features: it supports chain-of-thought reasoning and can handle complex chart understanding and mathematical problem-solving; it has outstanding OCR capabilities, accurately recognizing handwritten text, formulas, and complex layouts and extracting information in a structured manner; its lightweight design is suitable for enterprise-level deployment. It is applicable to scenarios such as educational homework correction, financial statement analysis, and intelligent document processing.

Qianfan-VL-70B

Image understanding | Chinese

Qianfan-VL-70B is a large vision-language model launched by Baidu Smart Cloud. It is the 70-billion-parameter version, optimized for enterprise multimodal applications. It has three major features: an ultra-long context window that supports complex chart understanding and mathematical reasoning; enhanced OCR and document understanding that accurately recognizes handwritten text and complex layouts and extracts information in a structured manner; and training on the Kunlun Chip P800, with the capacity to process over 1 billion images. It is suited to scenarios such as financial chart analysis, educational math problem-solving, and intelligent enterprise document processing.

Hunyuan T1 20250822

Inference model | Chinese, English

Hunyuan-T1-20250822 is the flagship inference model in Tencent's Hunyuan family and belongs to the text generation category. It accepts a maximum input of 32K and produces a maximum output of 64K, with improved ability in high-difficulty mathematics, logic, and code, and optimized long-text processing and output stability. It is suited to scenarios such as text generation, writing, and Q&A.

Doubao Seed 1.6 Vision

Image understanding | Tool Call

Doubao-Seed-1.6-vision is a multimodal visual deep-thinking model released by ByteDance. It supports a 256K context window and tool invocation, and can automatically call image processing tools such as rotation and zooming. It is suited to scenarios such as video understanding, medical image analysis, and manufacturing quality inspection, and is available in the Doubao app and on Volcengine.

Hunyuan T1 Latest

Inference model | Chinese, English

Hunyuan-T1-latest is a large deep inference model launched by Tencent in March 2025. It adopts the Hybrid-Transformer-Mamba MoE architecture and has a parameter scale in the trillions. It is strong at capturing information in long texts, mathematical and logical reasoning, and code generation. The decoding speed is 60-80 tokens/s. It supports API calls and is suited to scenarios such as complex problem solving, scientific computing, and AI search.

DeepSeek-V3.1

Inference model | Tool Call

DeepSeek-V3.1 is a large language model released by the Chinese AI company DeepSeek in August 2025. It adopts a hybrid inference architecture and a 671-billion-parameter MoE design, switches between "thinking" and "non-thinking" modes, and unifies general dialogue, complex reasoning, and coding capabilities. Its enhanced agent capabilities cover tool use, multi-step reasoning, and programming assistance. It is available through an API and released under the MIT open-source license, making it suited to scenarios such as agent development and financial risk control.

Qwen-MT-Image

Image generation | Multilingual

Qwen-MT-Image is an image translation model in the Tongyi Qianwen series. It accurately translates the text in images while preserving the original layout, and supports customization such as domain prompts, sensitive-word filtering, and terminology intervention. It is suited to scenarios such as localizing multilingual image content and processing cross-language graphic information.

Qwen3-1.7B

Text generation | Multilingual

Qwen3-1.7B is an open-source dense model in Alibaba's Qwen3 series, with 1.7B parameters. It supports 119 languages and has a hybrid thinking mode in which the reasoning process can be switched on or off manually. It has low hardware requirements and is suited to scenarios such as local testing and rapid research experiments.

Tencent Hunyuan Video Generation

Video generation | Chinese, English

Tencent Hunyuan Video Generation is an AI video generation and processing service launched by Tencent. Based on multimodal fusion technology, it supports functions such as video special effects, style conversion, and image animation. Its features include highly coherent motion generation and accurate semantic understanding. It is suited to scenarios such as short video creation, advertising and marketing, and educational content production, where it lowers the barrier to professional production and improves content production efficiency.

AIBase

Empowering the Future, Your AI Solution Knowledge Base

© 2026 AIBase